Amino acid modified polypeptides

ABSTRACT

Methods are disclosed for simplified recombinant production of fibrillar collagens. DNAs encoding fibrillar collagen monomers lacking the N propeptide, the C propeptide, or both propeptides are introduced into recombinant host cells and expressed. Trimeric collagen is recovered from the recombinant host cells.

TECHNICAL FIELD

The invention relates generally to the field of recombinant protein production, and particularly to the production of telopeptide collagen in recombinant host cells.

BACKGROUND ART

Collagen is the major protein component of bone, cartilage, skin and connective tissue in animals. Collagen in its native form is typically a rigid, rod-shaped molecule approximately 300 nm long and 1.5 nm in diameter. It is composed of three collagen polypeptide monomers which form a triple helix. Mature collagen monomers are characterized by a long midsection having the repeating sequence -Gly-X-Y-, where X and Y are often proline or hydroxyproline, bounded at each end by the “telopeptide” regions, which constitute less than about 5% of the molecule. The telopeptide regions of the chains are typically responsible for the crosslinking between the chains (i.e., the formation of collagen fibrils), and for the immunogenicity of the protein. Collagen occurs naturally in a number of “types”, each having different physical properties. The most abundant types in mammals and birds are types I, II and III.

Mature collagen is formed by the association of three procollagen monomers which include “pro” domains at the amino and carboxy terminal ends of the polypeptides. The pro domains are cleaved from the assembled procollagen trimer to create mature, or “telopeptide” collagen. The telopeptide domains may be removed by chemical or enzymatic means to create “atelopeptide” collagen.

Interestingly, although there are a large number of different genes encoding for different procollagen monomers, only particular combinations are produced naturally. For example, skin fibroblasts synthesize 10 different procollagen monomers (proα1(I), proα1(III), proα1(V), proα2(I), proα2(V), proα3(V), proα1(VI), proα2(VI), proα3(VI) and proα1(VII)), but only 5 types of mature collagen are produced (types I, III, V, VI and VII).

Collagen has been utilized extensively in biological research as a substrate for in vitro cell culture. It has also been widely used as a component of biocompatible materials for use in prosthetic implants, sustained drug release matrices, artificial skin, and wound dressing and wound healing matrices.

Historically, collagen has been isolated from natural sources, such as bovine hide, cartilage or bones, and rat tails. Bones are usually dried, defatted, crushed, and demineralized to extract collagen, while cartilage and hide are typically minced and digested with proteolytic enzymes other than collagenase. As collagen is resistant to most proteolytic enzymes (except collagenase), this procedure can conveniently remove most of the contaminating protein that would otherwise be extracted along with the collagen. However, for medical use, species-matched collagen (e.g., human collagen for use in human subjects) is highly desirable in order to minimize the potential for immune response to the collagen material.

Human collagen may be purified from human sources such as human placenta (see, for example, U.S. Pat. Nos. 5,002,071 and 5,428,022). Of course, the source material for human collagen is limited in supply and carries with it the risk of contamination by pathogens such as hepatitis virus and human immunodeficiency virus (HIV). Additionally, the material recovered from placenta is biased as to type and not entirely homogeneous.

Collagen may also be produced by recombinant methods. For example, International Patent Application No. WO 97/14431 discloses methods for recombinant production of procollagen in yeast cells and U.S. Pat. No. 5,593,859 discloses the expression of procollagen genes in a variety of cell types. In general, the recombinant production of collagen requires a cloned DNA sequence encoding the appropriate procollagen monomer(s). The procollagen gene(s) is cloned into a vector containing the appropriate DNA sequences and signals for expression of the gene and the construct is introduced into the host cells. Optionally, genes expressing a prolyl-4-hydroxylase alpha subunit and a protein disulfide isomerase are also introduced into the host cells (these are the two subunits which make up prolyl-4-hydroxylase). Addition of the prolyl-4-hydroxylase leads to the conversion of some of the prolyl residues in the procollagen chains to hydroxyproline, which are important in interchain bonds of the triple helix and increase the thermal stability of the protein.

Alternately, recombinant collagen may be produced using transgenic technology. Constructs containing the desired collagen gene linked to the appropriate promoter/enhancer elements and processing signals are introduced into embryo cells by the formation of ES cell chimera, direct injection into oocytes, or any other appropriate technique. Transgenic production of recombinant collagen is particularly advantageous when the collagen is expressed in milk (i.e., by mammary cells), such as described in U.S. Pat. No. 5,667,839 to Berg. However, the production of transgenic animals for commercial production of collagen is a long and expensive process.

One difficulty of recombinant expression of collagen is the processing of the “pro” regions of procollagen monomers. It is widely accepted that folding of the three monomers to form the trimer begins in the carboxy pro-region (“C propeptide”) and that the C propeptide contains signals responsible for monomer selection (Bachinger et al., 1980, Eur. J. Biochem., 106:619-632; Bachinger et al., 1981, J. Biol. Chem. 256:13193-13199). One group has identified a region in the carboxy pro-region that they believe is necessary and sufficient for monomer selection (Bulleid et al., 1997, EMBO J. 16(22):6694-6701; Lees et al., 1997, EMBO J. 16(5):908-916; International Patent Application No. WO 97/08311; McLaughlin et al., 1998, Matrix Biol. 16:369-377). Additionally, Lee et al. (1992, J. Biol. Chem. 267(33):24126-24133) have shown that deletion of the N propeptide results in decreased secretion of human α1 pC collagen from CHL cells, but not Mov-13 cells. Accordingly, it is believed that the pro-regions must be retained for proper chain selection, alignment and folding of collagen produced by recombinant methods. In cells which normally produce collagens, specific proteolytic processing enzymes are produced which remove the N and C propeptides following the secretion of collagen. These enzymes are not present in cells which do not normally produce collagen (including commonly used recombinant host cells such as bacteria and yeast).

Ideally, the recombinant production of collagen is accomplished with a recombinant host cell system that has a high capacity and a relatively low cost (such as bacteria or yeast). Because bacteria and yeast do not normally produce the enzyme necessary for processing of the N and C propeptides, the propeptides must be removed after recovering the recombinant procollagen from the host cells. This can be accomplished by the use of pepsin, but processing with pepsin produces “ragged” ends that do not correspond to the ends of mature collagen secreted by mammalian cells which normally produce fibrillar collagen. Alternately, the enzymes which process the N and C propeptides can be produced and used to remove the propeptides. Any contamination of these preparations with other proteases will result in ragged ends. This added processing step increases the cost and decreases the convenience of production in these otherwise desirable host cell systems. Accordingly, there is a need in the art for simplified methods of producing genuine telopeptide collagen in high capacity systems.

DISCLOSURE OF THE INVENTION

The inventors have discovered new methods for the recombinant production of fibrillar collagens. The inventors have surprisingly and unexpectedly found that co-expression of DNA constructs encoding α1(I) and α2(I) collagen monomers lacking the N and C propeptides results in the formation of heterotrimeric telopeptide collagen having the properties of genuine human type I collagen. Additionally, the inventors have found that co-expression in yeast of DNA constructs encoding a non-collagen signal sequence linked to α1(I) and α2(I) collagen monomers lacking the N propeptide results in a surprising increase in the production of “pC collagen” (procollagen trimer lacking the N propeptide). Additionally, co-expression in yeast of DNA constructs encoding a non-collagen signal sequence linked to α1(I) and α2(I) collagen monomers lacking the N and C propeptides results in a surprising increase in the production of type I collagen.

The methods of the instant invention may be used to produce any of the fibrillar collagens (e.g., types I-III, V and XI) from any species, but are particularly useful for the production of recombinant human collagens for use in medical applications.

In one embodiment, the invention relates to methods for producing fibrillar collagen by culturing a recombinant host cell comprising a DNA encoding a fibrillar collagen monomer lacking a C propeptide sequence selection and alignment domain (SSAD) under conditions appropriate for expression of said DNA; and producing fibrillar collagen. The DNA may encode any of the fibrillar collagen monomers, such as α1(I), α2(I), α1(II), α1(III), α1(V), α2(V), α3(V), α1(XI), α2(XI), and α3(XI). Optionally, the DNA encoding the fibrillar collagen monomer lacking a C propeptide SSAD may also lack DNA encoding the N propeptide.

In another embodiment, the invention relates to methods for producing fibrillar collagen by culturing a recombinant yeast host cell comprising a DNA encoding a fibrillar collagen monomer lacking a N propeptide under conditions appropriate for expression of said DNA; and producing fibrillar collagen.

Another embodiment relates to recombinant host cells comprising an expression construct comprising a DNA encoding a fibrillar collagen monomer lacking a C propeptide sequence selection and alignment domain (SSAD). The DNA may encode any of the fibrillar collagen monomers, such as α1(I), α2(I), α1(II), α1(III), α1(V), α2(V), α3(V), α1(XI), α2(XI), and α3(XI). Optionally, the DNA encoding the fibrillar collagen monomer lacking a C propeptide SSAD may also lack DNA encoding the N propeptide.

In a further embodiment, the invention relates to trimeric collagen molecules which lack propeptide domains and lack any glycosylation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an alignment of SSAD sequences, shown in single letter amino acid code, as identified by Lees et al. (1997, supra). Positions 1-12 and 21-23 are considered the essential positions in the SSAD.

FIG. 2 shows a map of shuttle vector plasmid Gp5432.

FIG. 3 shows the amino acid sequence of human preproα1(I) collagen posted to Genbank under accession number AF017178. The signal sequence (pre domain) is underlined. The first amino acid of the N telopeptide is marked with an “*”. The last amino acid of the C telopeptide is marked with a “#”.

FIG. 4 shows the amino acid sequence of human preproα2(I) collagen posted to Genbank under accession number Z74616. The signal sequence (pre domain) is underlined. The first amino acid of the N telopeptide is marked with an “*”. The last amino acid of the C telopeptide is marked with a “#”.

FIG. 5 shows a half-tone reproduction of a western blot demonstrating results from a thermal stability protease assay. Lanes labeled HSF are samples of type I collagen from medium conditioned by human skin fibroblasts. Lanes labeled CYT29 are collagen produced in yeast using an expression construct encoding preproHSAα1(I) and preproHSAα2(I) (preproHSAα1(I) and preproHSAα2(I) comprise the human serum albumin signal sequence plus four amino acids of the pro domain linked to a KEX2 cleavage site fused to the α1(I) and α2(I) telopeptide collagen monomers).

FIG. 6 shows a half-tone reproduction of a western blot demonstrating results from a mammalian collagenase digest of human skin fibroblast and yeast-derived collagen. Lanes labeled HSF are samples of type I collagen from human skin fibroblasts. Lanes labeled CYT29 are collagen produced in yeast using an expression construct encoding preproHSAα1(I) and preproHSAα2(I).

FIG. 7 shows a map of shuttle vector plasmid Gp5102.

BEST MODE FOR CARRYING OUT THE INVENTION

The methods of the instant invention generally involve the use of recombinant host cells comprising DNA expression constructs encoding the production of fibrillar collagen monomers lacking at least portions of one or both of the propeptides. The recombinant host cells are incubated under conditions appropriate for the expression of the constructs, and trimeric telopeptide collagen is recovered.

Definitions

As used herein, the term “collagen” refers to a family of homotrimeric and heterotrimeric proteins comprised of collagen monomers. There are a multitude of known collagens (at least 19 types) which serve a variety of functions in the body. There are an even greater number of collagen monomers, each encoded by a separate gene, that are necessary to make the different collagens. The most common collagens are types I, II, and III. Collagen molecules contain large areas of helical structure, wherein the three collagen monomers form a triple helix. The regions of the collagen monomers in the helical areas of the collagen molecule generally have the sequence GXY, where G is glycine and X and Y are any amino acid, although most commonly X and Y are proline and/or hydroxyproline. Hydroxyproline is formed from proline by the action of prolyl-4-hydroxylase.
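The Gly-X-Y repeat described above can be checked mechanically. The following is a minimal sketch, assuming the helical segment is supplied in single letter amino acid code and begins in frame with a glycine; the function name and the test peptides are hypothetical and are not taken from any sequence in this disclosure.

```python
def is_gly_x_y_repeat(segment: str) -> bool:
    """Return True if the segment is a whole number of Gly-X-Y triplets.

    Assumes single letter amino acid code and that the segment starts
    in frame, i.e. the first residue is the glycine of a triplet.
    """
    seq = segment.upper()
    # An empty segment or a partial trailing triplet is rejected.
    if not seq or len(seq) % 3 != 0:
        return False
    # Every third residue, starting with the first, must be glycine (G).
    return all(seq[i] == "G" for i in range(0, len(seq), 3))


if __name__ == "__main__":
    print(is_gly_x_y_repeat("GPPGAPGPK"))  # three triplets, each starting with G
    print(is_gly_x_y_repeat("GPPAPP"))     # second triplet does not start with G
```

Only the glycine position is tested, because X and Y may be any amino acid.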

As used herein, the term “fibrillar collagen” means a collagen which can normally form collagen fibrils. The fibrillar collagens are collagen types I-III, V, and XI. The collagen monomers that make up the fibrillar collagens contain “telopeptide” regions at the amino (N) and carboxy (C) terminal ends of the monomers which are non-helical in the collagen trimer. These collagens self-assemble into fibrils with the C-terminal end of the helical domain and the C telopeptide of one collagen triple helix overlapping with the N telopeptide and the N-terminal end of the triple helical domain of an adjacent collagen molecule. The monomers that make up the fibrillar collagens are made as preproproteins, including an N-terminal secretion signal sequence and N and C-terminal propeptide domains. The signal sequence is normally cleaved by signal peptidase, as with most secreted proteins, and the propeptides are removed by specific proteolytic processing enzymes after association, folding and secretion of trimeric collagen. The term fibrillar collagen encompasses both native (i.e., naturally occurring) and variant fibrillar collagens (i.e., fibrillar collagens with one or more alterations in the sequence of one or more of the fibrillar collagen monomers).

The term “sequence selection and alignment domain” or “SSAD” refers to a portion of the C propeptide of fibrillar collagens identified by Lees et al. (1997, supra) as responsible for chain selection and alignment. SSAD sequences for α1(I), α2(I), α1(II), α1(III), α1(V), α2(V), α1(XI), and α2(XI) have been identified in Lees et al. and are shown in FIG. 1. Only positions 1-12 and 21-23 of the sequences shown in FIG. 1 are considered part of the SSAD. SSADs from other fibrillar collagen monomers can easily be identified in the C propeptide of fibrillar collagen monomers by sequence similarity alignment with the SSADs shown in FIG. 1.
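The selection of positions 1-12 and 21-23 can be illustrated with a short sketch. It assumes that an aligned sequence of the kind shown in FIG. 1 is available as a string of at least 23 single letter residues; the placeholder string below is an arbitrary run of letters used only to make the positions visible and is not an SSAD sequence.

```python
# 1-based, inclusive position ranges described as part of the SSAD.
ESSENTIAL_RANGES = ((1, 12), (21, 23))


def essential_ssad_residues(aligned: str) -> str:
    """Return the residues at positions 1-12 and 21-23 of an aligned sequence."""
    if len(aligned) < 23:
        raise ValueError("aligned sequence must cover at least 23 positions")
    # Convert each 1-based inclusive range to a Python slice and join the pieces.
    return "".join(aligned[start - 1:end] for start, end in ESSENTIAL_RANGES)


if __name__ == "__main__":
    placeholder = "ABCDEFGHIJKLMNOPQRSTUVW"  # hypothetical, 23 positions
    print(essential_ssad_residues(placeholder))  # ABCDEFGHIJKLUVW
```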

The term “DNA encoding a fibrillar collagen monomer”, as used herein, means a DNA sequence which encodes a collagen monomer that is a component of a fibrillar collagen and which lacks the N propeptide domain, the SSAD, or both. cDNAs encoding fibrillar collagen monomers have been identified, cloned and sequenced, and are readily available to the research community through Genbank and other DNA sequence depositories. Due to the large size of the collagen monomers, the primary source of sequence information is cloned DNA sequence. By conceptual translation, the amino acid sequence of the fibrillar collagen monomers can be deduced. A DNA encoding a fibrillar collagen monomer is any DNA sequence that encodes the amino acid sequence of a fibrillar collagen monomer. Due to the degeneracy of the DNA code, a large number of different DNA sequences will be useful for the expression of any given fibrillar collagen monomer. Additionally, due to codon usage bias, the DNAs useful in the instant invention may be selected to be particularly advantageous for use in a particular host cell (e.g., for use in S. cerevisiae, DNAs encoding fibrillar collagen monomers may be selected or synthesized which utilize codons that are preferred in S. cerevisiae).
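Selecting codons for a particular host can be sketched as a simple back-translation. The example below is illustrative only: the codon table is a small assumed subset, chosen because each codon does encode the listed residue, and the choice of each as “preferred” in S. cerevisiae is an assumption that should be checked against a codon usage table for the intended host. The function name and the test peptide are hypothetical.

```python
# Assumed, partial table of host-preferred codons (illustrative only).
ASSUMED_PREFERRED_CODONS = {
    "G": "GGT",  # glycine
    "P": "CCA",  # proline
    "A": "GCT",  # alanine
    "K": "AAG",  # lysine
    "E": "GAA",  # glutamate
    "R": "AGA",  # arginine
}


def back_translate(peptide: str, table=ASSUMED_PREFERRED_CODONS) -> str:
    """Return one DNA sequence encoding the peptide using the given codon table."""
    codons = []
    for residue in peptide.upper():
        if residue not in table:
            raise ValueError(f"no codon listed for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("GPAGPK"))  # GGTCCAGCTGGTCCAAAG
```

Because the code is degenerate, this is only one of many DNA sequences encoding the same peptide.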

DNA encoding any collagen monomer that is a component of fibrillar collagen may be useful in the methods of the instant invention. Particularly preferred collagen monomers are α1(I), α2(I), α1(II), α1(III), α1(V), α2(V), α3(V), α1(XI), α2(XI), and α3(XI), more preferably the human forms of α1(I), α2(I), α1(II), α1(III), α1(V), α2(V), α3(V), α1(XI), α2(XI), and α3(XI). The amino acid sequences for these proteins are available to the public (see, for example, Tromp et al., 1988, Biochem J. 253(3):919-922; Kuivaniemi et al., 1988, Biochem J. 252(3):633-640; Su et al., 1989, Nucleic Acid Res. 17(22):9473; Ala-Kokko et al., 1989, Biochem. J. 260(2):509-516; Takahara et al., 1991, J. Biol. Chem. 266(20):13124-13129; Weil et al., 1987, Nucleic Acid Res. 15(1):181-198; Bernard et al., 1988, J. Biol. Chem. 263(32):17159-17166; Kimura et al., 1989, J. Biol. Chem. 264(23):13910-13916; Mann et al., 1992, Biol. Chem. Hoppe Seyler 373:69-75; Sandell et al., 1991, J. Cell. Biol., 114:1307-1319). Additionally, deletion mutants of fibrillar collagens such as that described in Sieron et al. (1993, J. Biol. Chem. 268(28):21232-21237) and D period deletions such as described in Zafarullah et al. (1997, Matrix Biol. 16:245-253) and Arnold et al. (1997, Matrix Biol. 16:105-116) may also be produced by the method of the instant invention. The DNAs may be obtained by any method from any source known in the art, such as isolation from cDNA or genomic libraries, chemical synthesis, or amplification from any available template. Additionally, DNAs encoding variants may be produced by de novo synthesis or by modification of an existing DNA by any of the methods known in the art.

DNA encoding fibrillar collagen monomers for use in accordance with the instant invention lack sequences encoding the N propeptide, the C propeptide SSAD, or both. Lees et al. (1997, supra) teach that the SSAD domain is required for proper chain selection and association of collagen monomers. Preferably, DNAs encoding fibrillar collagen monomers lack the SSAD and also lack sequence encoding at least 50% of the total C propeptide domain, more preferably at least 75% of the total C propeptide domain, and even more preferably at least 90% of the total C propeptide domain, and most preferably DNAs encoding fibrillar collagen monomers lack all of the C propeptide domain. Alternately, the DNA encoding fibrillar collagen monomers may lack sequence encoding part or all of the N propeptide domain. Preferred deletions of the sequence encoding the N propeptide domain include DNAs lacking sequence encoding 50%, 75%, 90% or all of the N propeptide. Additionally, DNA encoding fibrillar collagen monomers may lack sequence encoding portions of or the entirety of the N and C propeptides. Preferably, the DNA encoding fibrillar collagens for use in accordance with the instant invention lack sequences encoding both the N and C propeptides. The boundaries of the mature peptide and the N and C propeptides are well known in the art.

For use in the instant invention, the DNA encoding a fibrillar collagen monomer is cloned into an expression construct. General techniques for nucleic acid manipulation useful for the practice of the claimed invention are described, for example, in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Vols. 1-3 (Cold Spring Harbor Laboratory Press, 2 ed., 1989); or F. Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Greene Publishing and Wiley-Interscience: New York, 1987) and periodic updates.

The exact details of the expression construct will vary according to the particular host cell that is to be used as well as to the desired characteristics of the expression system, as is well known in the art. For example, for production in S. cerevisiae, the DNA encoding a fibrillar collagen monomer is placed into operable linkage with a promoter that is operable in S. cerevisiae and which has the desired characteristics (e.g., inducible/derepressible or constitutive). Where bacterial host cells are utilized, promoters and promoter/operators such as the araB, trp, lac, gal, tac (a hybrid of the trp and lac promoter/operator), T7, and the like are useful in accordance with the instant invention. Acceptable promoters for use in the instant invention where the host cell is S. cerevisiae include, but are not limited to, GAL1-10, PHO5, PGK1, GDP1, chelatin, PMA1, MET3, CUP1, GAP, TPI, MFα1 and MFα2, as well as the hybrid promoters PGK/α2, TPI/α2, GAP/GAL, PGK/GAL, GAP/ADH2, GAP/PHO5, ADH2/PHO5, CYC1/GRE, and PGK/ARE, and other promoters active in S. cerevisiae as are known in the art. Where S. pombe is utilized as the host cell, promoters such as FBP1, NMT1, ADH1 and other promoters active in S. pombe as are known in the art, such as the human cytomegalovirus (hCMV) LTR, may be used. The AOX1 promoter is preferred when Pichia pastoris is the host cell, although other promoters known in the art, such as GAP and PGK, are also acceptable. Further guidance with regard to features of expression constructs for yeast host cells may be found in, for example, Romanos et al. (1992, Yeast 8:423-488). When other eukaryotic cells are the desired host cell, any promoter active in the host cell may be utilized. For example, when the desired host cell is a mammalian cell line, the promoter may be a viral promoter/enhancer (e.g., the herpes virus thymidine kinase (TK) promoter or a long terminal repeat (LTR), such as the LTR from cytomegalovirus (CMV), Rous sarcoma virus (RSV) or mouse mammary tumor virus (MMTV)) or a mammalian promoter, preferably an inducible promoter such as the metallothionein or glucocorticoid receptor promoters and the like.

Expression constructs may also include other DNA sequences appropriate for the intended host cell. For example, expression constructs for use in higher eukaryotic cell lines (e.g., vertebrate and insect cell lines) will include a poly-adenylation site and may include an intron (including signals for processing the intron), as the presence of an intron appears to increase mRNA export from the nucleus in many systems. Additionally, a secretion signal sequence operable in the host cell is normally included as part of the construct. The secretion signal sequence may be from a collagen monomer gene or from a non-collagen gene. In one preferred embodiment, the secretion signal sequence is a prepro sequence derived from human serum albumin which additionally contains a KEX2 protease processing site (MKWVTFISLLFLFSSAYSRGVFRR in single letter amino acid code; the signal peptidase site is between S and R, and RGVF is derived from the HSA pro domain). If the secretion signal sequence is derived from a collagen monomer gene, it may be from a fibrillar collagen monomer (and may be derived from the same protein as the DNA encoding the fibrillar collagen monomer to be expressed or from a different fibrillar collagen monomer) or a non-fibrillar collagen monomer. Where the expression construct is intended for use in a prokaryotic cell, the expression construct should include a signal sequence which directs transport of the synthesized peptide into the periplasmic space.
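The layout of the HSA-derived leader can be made explicit with a short sketch that splits the 24-residue sequence given above into its parts. The split follows the description: the pre domain ends at the signal peptidase site between S and R, and RGVF is the pro-derived segment. Treating the final two residues (RR) as the KEX2 processing site is an inference from that description rather than an explicit statement, and the function and key names are hypothetical.

```python
# Leader sequence as given in the description (single letter code).
HSA_LEADER = "MKWVTFISLLFLFSSAYSRGVFRR"
PRO_SEGMENT = "RGVF"  # stated to be derived from the HSA pro domain


def split_leader(leader: str = HSA_LEADER) -> dict:
    """Split the leader into pre domain, pro-derived segment and trailing site."""
    pro_start = leader.index(PRO_SEGMENT)  # signal peptidase cuts just before this
    pro_end = pro_start + len(PRO_SEGMENT)
    return {
        "pre": leader[:pro_start],          # signal sequence removed by signal peptidase
        "pro": leader[pro_start:pro_end],   # four residues from the HSA pro domain
        "kex2_site": leader[pro_end:],      # assumed KEX2 processing site
    }


if __name__ == "__main__":
    for name, part in split_leader().items():
        print(name, part)
```

Run as written, the parts are MKWVTFISLLFLFSSAYS, RGVF and RR, which together account for all 24 residues of the leader.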

Preferably, the expression construct will also comprise a means for selecting for host cells which contain the expression construct (a “selectable marker”). Selectable markers are well known in the art. For example, the selectable marker may be a resistance gene, such as an antibiotic resistance gene (e.g., the neo gene which confers resistance to the antibiotic gentamycin), or it may be a gene which complements an auxotrophy of the host cell. If the host cell is a yeast cell, the selectable marker is preferably a gene which complements an auxotrophy of the cell (for example, complementing genes useful in S. cerevisiae, P. pastoris and S. pombe include LEU2, TRP1, TRP1d, URA3, URA3d, HIS3, HIS4, ARG4, LEU2d), although antibiotic resistance markers such as SH BLE, which confers resistance to zeocin, may also be used. If the host cell is a prokaryotic or higher eukaryotic cell, the selectable marker is preferably an antibiotic resistance marker (e.g., neo^(r) or bla). Alternately, a separate selectable marker gene is not included in the expression vector, and the DNA encoding the fibrillar collagen monomer is used as a selectable marker (upon induction or derepression for controllable promoters, or after transfection for a constitutive promoter, fluorescence-activated cell sorting, FACS, may be used to select those cells which express the recombinant collagen). Preferably, the expression construct comprises a separate selectable marker gene.

The expression construct may also contain sequences which act as an “ARS” (autonomous replicating sequence) which will allow the expression construct to replicate in the host cell without being integrated into the host cell chromosome. Origins of replication for bacterial plasmids are well known. ARS for use in yeast cells are also well known (the 2μ origin of replication and operative fragments thereof are preferred, especially the full length 2μ sequence; see, for example, International Patent Application No. WO 97/14431, although CEN-based plasmids and YACs are also useful in the instant invention), and ARS which act in higher mammalian cells have been recently described (see, for example, Pelletier et al., 1997, J. Cell. Biochem. 66(1):87-97). Alternately, the expression construct may include DNA sequences which will direct or allow the integration of the construct into the host cell chromosome by homologous or site-directed recombination.

Where the host cell is a eukaryotic cell, it may be advantageous for the expression vector to be a “shuttle vector”, because manipulation of DNA is substantially more convenient in bacterial cells. A shuttle vector is one which carries the signals necessary for manipulation in bacteria as well as in the desired host cell. So, for example, the expression construct may also comprise an ARS (“ori”) which acts in prokaryotic cells as well as a selectable marker which is useful for selection of prokaryotic cells.

The host cells for use in the instant invention may be any convenient host cell, including bacterial, yeast, and higher eukaryotic cells. Yeast and higher eukaryotic cells are preferred host cells. For yeast host cells, Saccharomyces cerevisiae, Pichia pastoris, Hansenula polymorpha, Kluyveromyces lactis, Schwanniomyces occidentalis and Yarrowia lipolytica strains are preferred. Of the higher eukaryotic cells, insect cells such as Sf9 are preferred, as are mammalian cell lines which produce non-fibrillar collagens and do not produce any endogenous fibrillar collagens, such as HT-1080, 293, and NSO cells.

If the host cell does not have prolyl-4-hydroxylase activity (or has insufficient activity as is the case in insect cells), the host cell preferably is altered to produce prolyl-4-hydroxylase. This may be conveniently accomplished by introducing expression constructs coding for the expression of the subunits of prolyl-4-hydroxylase into the host cell. Prolyl-4-hydroxylase is a tetramer comprising two alpha subunits and two beta subunits (α₂β₂). The beta subunit is also known as protein disulfide isomerase (PDI). Expression constructs for prolyl-4-hydroxylase have been described for yeast (Vuorela et al., 1997, EMBO J. 16(22):6702-6712) and for insect cells (Lamberg et al., 1996, J. Biol. Chem. 271(20):11988-11995). In the case of a bacterial host cell, the expression construct for prolyl-4-hydroxylase will preferably incorporate a translocation signal to direct the transport of the subunits of the enzyme to the periplasmic space. Alternately, the prolyl-4-hydroxylase expression construct may be included in the fibrillar collagen monomer construct. In this arrangement, the expression construct may direct the production of separate messages for the fibrillar collagen monomer and the prolyl-4-hydroxylase subunits or it may direct the production of a polycistronic message. Separate messages are preferred for eukaryotic hosts, while the expression of a polycistronic message is preferred for prokaryotic hosts.

The expression construct is introduced into the host cells by any convenient method known to the art. For example, for yeast host cells, the construct may be introduced by electroporation, lithium acetate/PEG and other methods known in the art. Higher eukaryotes may be transformed by electroporation, microprojectile bombardment, calcium phosphate transfection, lipofection, or any other method known to the art. Bacterial host cells may be transfected by electroporation, calcium chloride-mediated transfection, or any other method known in the art.

After introduction of the expression construct into the host cell, host cells comprising the expression construct are normally selected on the basis of the selectable marker that is included in the expression vector. As will be apparent, the exact details of the selection process will depend on the identity of the selectable marker. If the selectable marker is an antibiotic resistance gene, the transfected host cell population is generally cultured in the presence of an antibiotic to which resistance is conferred by the selectable marker. The antibiotic eliminates those cells which are not resistant (i.e., those cells which do not carry the resistance gene) and allows the propagation of those host cells which carry the resistance gene (and presumably carry the rest of the expression construct as well). If the selectable marker is a gene which complements an auxotrophy of the host cells, then the transfected host cell population is cultured in the absence of the compound for which the host cells are auxotrophic. Those cells which are able to propagate under these conditions carry the complementing gene and thus presumably carry the rest of the expression construct.

Host cells which pass the selection process may be “cloned” according to any method known in the art that is appropriate for the host cell. For microbial host cells such as yeast and bacteria, the selected cells may be plated on solid media under selection conditions, and single clones may be selected for further selection, characterization or use. Higher eukaryotic cells are generally further cloned by limiting dilution. This process may be carried out several times to ensure the stability of the expression construct within the host cell.

For production of trimeric collagen, the recombinant host cells comprising the expression construct are generally cultured to expand cell numbers. This expansion process may be carried out in any appropriate culturing apparatus known to the art. For yeast and bacterial cells, an apparatus as simple as a shaken culture flask may be used, although large scale culture is generally carried out in a fermenter. For insect cells, the culture is generally carried out in “spinner flasks” (culture vessels comprising a means for stirring the cells suspended in a liquid culture medium). For mammalian cell lines, the cells may be grown in simple culture plates or flasks, but as for the yeast and bacterial host cells, large scale culture is generally performed in a specially adapted apparatus, a variety of which are known in the art.

If the host cells comprise (either naturally or by introduction of the appropriate expression constructs) prolyl-4-hydroxylase, then vitamin C (ascorbic acid or one of its salts) may be added to the culture medium, although applicants have found ascorbate may not be necessary if the recombinant host cells are S. cerevisiae cells. If ascorbic acid is added, it is generally added to a concentration of between 10-200 μg/ml, preferably about 80 μg/ml. If ascorbate is to be added, it need not be added until the host cells begin producing recombinant collagen.
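The ascorbate supplementation described above reduces to simple arithmetic. The following is a minimal sketch, assuming a culture of known volume; the function name `ascorbate_mass_mg` and the 500 ml example volume are illustrative assumptions and do not appear in the disclosure.

```python
def ascorbate_mass_mg(culture_volume_ml: float, target_ug_per_ml: float = 80.0) -> float:
    """Return the mass of ascorbic acid (mg) needed to reach a target concentration.

    The default of 80 ug/ml and the 10-200 ug/ml bounds come from the text;
    everything else here is an illustrative assumption.
    """
    if culture_volume_ml <= 0:
        raise ValueError("culture volume must be positive")
    if not 10.0 <= target_ug_per_ml <= 200.0:
        raise ValueError("target is outside the 10-200 ug/ml range given in the text")
    # ug/ml * ml = ug; divide by 1000 to convert ug to mg
    return culture_volume_ml * target_ug_per_ml / 1000.0


if __name__ == "__main__":
    # Hypothetical 500 ml culture at the preferred concentration
    print(ascorbate_mass_mg(500.0))
```

If a salt of ascorbic acid is used instead of the free acid, the mass would need to be adjusted for the difference in molecular weight, which this sketch does not attempt.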

The recombinant host cells are cultured under conditions appropriate for the expression of the DNA encoding the fibrillar collagen monomer. If the expression construct utilizes a controllable expression system, the expression of the DNA encoding the fibrillar collagen monomers is induced or derepressed, as is appropriate for the particular expression construct. The exact method of inducing or derepressing the expression of the DNA encoding the fibrillar collagen monomers will depend on the properties of the particular expression construct used and the identity of the host cell, as will be apparent to one of skill in the art. Generally, for inducible promoters, a molecule which induces expression is added to the culture medium. For example, in yeast transfected with an expression vector utilizing the GAL1-10 promoter, galactose is added to the culture medium in the absence of dextrose. In bacteria utilizing an expression vector with the lac promoter, isopropyl-β-D-thiogalactopyranoside (IPTG) is added to the medium to derepress expression. For constitutive promoters, the cells are cultured in a medium providing the appropriate environment and sufficient nutrients to support the survival of the cells and the synthesis of the fibrillar collagen monomers.

Mature fibrillar collagen is produced by the recombinant host cells. Surprisingly, the fibrillar collagen monomers assemble into mature collagen trimers in the absence of the C propeptide.

Fibrillar collagen may then be recovered from the culture. The exact method of recovery of the collagen from the culture will depend on the host cell type and the expression construct. In many microbial host cells, the collagen will be trapped within the cell wall of the recombinant host cell, even though it has been transported out of the cytoplasm. In this instance, the host cells are preferably disrupted to recover the fibrillar collagen. Disruption may be accomplished by any means known in the art, including sonication, microfluidization, lysis in a French press or similar apparatus, disruption by vigorous agitation with glass beads, and the like. Alternatively, in higher eukaryotic cells or microbial cells having mutations which render them “leaky”, the fibrillar collagen may be recovered by collection of the culture medium.

When DNAs encoding collagen monomers lacking the N and C propeptides are utilized in yeast or prokaryotic cells in accordance with the methods of the instant invention, non-glycosylated trimeric collagen having genuine N and C terminal ends (i.e., the N and C telopeptide ends found in fibrillar collagens secreted from mammalian cells that normally produce fibrillar collagen) is produced.

The patents, patent applications, and publications cited throughout the disclosure are incorporated herein by reference in their entirety.

EXAMPLES

Example 1

Recombinant Production of Type I Telopeptide Collagen

Recombinant type I telopeptide collagen (α1 homotrimer and α1/α2 heterotrimer) was produced in S. cerevisiae host cells using expression constructs coding for human α1(I) and α2(I) collagen monomers. A number of different shuttle vectors were created, most based on Gp5432 (see FIG. 2 for a map of Gp5432), which contains DNA encoding the preprocollagen α1(I) and α2(I) monomers operably linked to the bidirectional GAL1-10 promoter (the sequences of preproα1(I) and preproα2(I) are shown in FIGS. 3 and 4, respectively). The PGK terminator (PGKt) is supplied at the 3′ end of the α2(I) sequence, while a terminator in the 2μ ARS (from the FLP gene) acts to terminate transcription of the α1(I) gene. Gp5432 also contains a yeast selectable marker (TRP1), an operable 1.6 kb fragment of the 2μ yeast origin, a bacterial ori, and a bacterial selectable marker (bla). Additionally, a construct was made based on Gp5102, which is very similar to Gp5432 but does not contain the α2(I) sequence or the PGKt (see FIG. 7 for a map of Gp5102). Constructs were created from Gp5432 which: (a) replaced the collagen secretion signal sequence (the “pre” domain) with a prepro domain from human serum albumin (HSA) which additionally contains a KEX2 protease processing site (MKWVTFISLLFLFSSAYSRGVFRR in single letter amino acid code; KEX2 cleaves at the carboxy-end of RR; designated pGET462); (b) encoded pC α1(I) and pC α2(I) linked to the preproHSA/KEX2 sequence (designated pDO243880); and (c) encoded the α1(I) and α2(I) mature domains (i.e., the signal sequence and the N and C propeptides were deleted from preproCOL1A1 and preproCOL1A2) linked to the preproHSA/KEX2 sequence or to their native signal sequences (designated pDO248053 and pDO248098, respectively). pDO248010 was created from Gp5102, and encodes the α1(I) telopeptide sequence linked to the preproHSA/KEX2 sequence.
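The position of the KEX2 cleavage relative to the preproHSA leader can be illustrated with a short sketch. It assumes, as a simplification, that the leader is removed as a single unit immediately after its terminal RR; the function name `kex2_cleave` and the placeholder mature sequence are hypothetical and stand in for the collagen sequences of the figures, which are not reproduced here.

```python
from typing import Tuple

# Leader sequence quoted in the text (preproHSA with KEX2 site), single letter code
HSA_PREPRO_KEX2 = "MKWVTFISLLFLFSSAYSRGVFRR"

# Hypothetical placeholder for a mature collagen domain; NOT a real collagen sequence
PLACEHOLDER_MATURE = "GPPGPPGPP"


def kex2_cleave(fusion: str, leader: str = HSA_PREPRO_KEX2) -> Tuple[str, str]:
    """Split a leader/collagen fusion at the carboxy-end of the leader's terminal RR.

    Simplification: the leader is treated as one unit; stepwise removal of the
    signal peptide before KEX2 processing is not modeled.
    """
    if not leader.endswith("RR"):
        raise ValueError("leader does not end in the dibasic RR site")
    if not fusion.startswith(leader):
        raise ValueError("fusion does not begin with the expected leader")
    cut = len(leader)  # cleavage occurs immediately after the second R
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    removed, retained = kex2_cleave(HSA_PREPRO_KEX2 + PLACEHOLDER_MATURE)
    print("removed leader:", removed)
    print("retained product:", retained)
```

Because the cut falls after the RR dipeptide, no leader residues remain on the retained product in this model, which is consistent with the genuine N terminal ends described elsewhere in the disclosure.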

The expression constructs were transformed into GY5361 by electroporation. This host strain also contained a chromosomally-integrated expression construct encoding the two subunits of chicken prolyl-4-hydroxylase. The alpha subunit (Bassuk et al., 1989, Proc. Natl. Acad. Sci. USA 86:7382-7386) and beta subunit, also known as PDI (Kao et al., 1988, Conn. Tiss. Res. 18:157-174), were cloned into an expression construct under the control of the bidirectional GAL1-10 promoter. The prolyl-4-hydroxylase construct also included the URA3 selectable marker and sequences from the TRP1 gene to allow integration by homologous recombination. Correct integrants were TRP1Δ.

After electroporation of GY5361 with 100 ng of plasmid DNA, transformants were selected on 2% agar plates containing 2% dextrose, 0.67% yeast nitrogen base lacking amino acids (YNB), 0.5% casamino acids by growing 3 days at 30° C. Transformants were grown overnight at 30° C. in media containing 2% dextrose, 0.67% YNB, 0.5% casamino acids to an OD₆₀₀ of 3 (approximately 1×10⁸ cells/ml). To induce collagen expression, the overnight cultures (in glucose-containing media) were diluted to an OD₆₀₀ of approximately 0.05 in media containing 0.5% galactose, 0.5% dextrose, 0.67% YNB and 0.5% casamino acids, 1% sodium citrate, pH 6.5, 50 mM sodium ascorbate, 300 mM α-ketoglutarate, 100 mM ferric chloride (FeCl₃), 100 mM glycine, 100 mM proline. Inductions were allowed to proceed for 48-96 hours at 30° C.
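The dilution of the overnight culture into induction medium follows the usual C₁V₁ = C₂V₂ relationship. The sketch below assumes that OD₆₀₀ scales linearly with cell density over this range; the function name `dilution_volumes` and the 60 ml final volume are illustrative assumptions rather than values from the example.

```python
from typing import Tuple


def dilution_volumes(od_start: float, od_target: float, final_volume_ml: float) -> Tuple[float, float]:
    """Return (inoculum_ml, fresh_medium_ml) for diluting a culture to a target OD600.

    Uses C1*V1 = C2*V2 and assumes OD600 is linear with cell density.
    """
    if od_target <= 0 or final_volume_ml <= 0:
        raise ValueError("target OD and final volume must be positive")
    if od_start <= od_target:
        raise ValueError("starting OD must exceed the target OD")
    inoculum_ml = final_volume_ml * od_target / od_start
    return inoculum_ml, final_volume_ml - inoculum_ml


if __name__ == "__main__":
    # OD600 values from the text (3 diluted to about 0.05); 60 ml is a hypothetical volume
    inoculum, medium = dilution_volumes(3.0, 0.05, 60.0)
    print(f"inoculum: {inoculum:.2f} ml, induction medium: {medium:.2f} ml")
```

The OD₆₀₀ values in the text correspond to a 60-fold dilution, so under the same linearity assumption the starting density of the induced culture would be roughly one sixtieth of the stated 1×10⁸ cells/ml.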

Cells were harvested by centrifugation, resuspended in 0.1 M tris-HCl, pH 7.4, 0.4 M NaCl, 10 mM EDTA and lysed by vortexing in a centrifuge tube with glass beads. The beads and cellular debris were removed by centrifugation. Production of type I collagen was measured by immunoassay and protease sensitivity.

Protein yield was determined using a luminometric immunoassay. The assay utilizes a goat anti-type I collagen antibody commercially available from Biodesign International (Kennebunk, Me.) derivatized with either biotin or ruthenium chelate. Samples were diluted from 1:40 to 1:60 in “Matrix buffer” (100 mM PIPES, pH 6.8, and 1% w/v bovine serum albumin) and 25 μl samples were dispensed into tubes. 50 μl of an antibody working solution containing 1 μg/ml of ruthenium chelate conjugated antibody and 1.5 μg/ml biotin conjugated antibody in diluent (Matrix buffer plus 1.5% Tween-20) was added to each tube and the tubes were incubated for two hours at room temperature (approximately 20° C.). After the incubation, 25 μl of a 1 mg/ml solution of streptavidin-conjugated magnetic beads (in diluent) was added to each tube. The tubes were shaken or vortexed for 30 seconds. 200 μl of assay buffer (ORIGEN assay buffer, Igen, Inc., catalog number 402-050-01) was added to each tube and the tubes were mixed and then placed in an ORIGEN analyzer (Igen, Inc., model #1100-1000). Results are shown below in Table 1.

TABLE 1

Strain    Proteins encoded                        Expression level (μg collagen/mg protein)
CYT 30    preproCOLα1(I)/preproCOLα2(I)           0.68 ± 0.046
CYT 31    preproHSAproα1(I)/preproHSAproα2(I)     0.43 ± 0.015
CYT 32    preproHSApCα1(I)/preproHSAproα2(I)      1.21 ± 0.19
CYT 33    preproHSAα1(I)/preproHSAα2(I)           1.50 ± 0.038
CYT 44    preCOLα1(I)/preCOLα2(I)                 0.13 ± 0.022
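The relative expression levels in Table 1 can be compared by normalizing each strain to CYT 30, the strain encoding the unmodified preprocollagen monomers. The following sketch uses the mean values reported in Table 1; the choice of CYT 30 as the reference and the function name `fold_vs_reference` are assumptions made for illustration, and the ± values are carried along without error propagation because the disclosure does not state what they represent.

```python
from typing import Dict, Tuple

# (mean, reported +/- value) in ug collagen per mg protein, taken from Table 1
TABLE_1: Dict[str, Tuple[float, float]] = {
    "CYT 30": (0.68, 0.046),
    "CYT 31": (0.43, 0.015),
    "CYT 32": (1.21, 0.19),
    "CYT 33": (1.50, 0.038),
    "CYT 44": (0.13, 0.022),
}


def fold_vs_reference(table: Dict[str, Tuple[float, float]], reference: str = "CYT 30") -> Dict[str, float]:
    """Return each strain's mean expression divided by the reference strain's mean."""
    ref_mean, _ = table[reference]
    return {strain: mean / ref_mean for strain, (mean, _) in table.items()}


if __name__ == "__main__":
    for strain, fold in fold_vs_reference(TABLE_1).items():
        print(f"{strain}: {fold:.2f}-fold relative to CYT 30")
```

On these figures the construct lacking both propeptides (CYT 33) gives the highest mean expression, a little over twice that of CYT 30.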

The constructs expressing α1 and α2 linked to their native signal sequences gave reduced expression, which is believed to be due to an alteration of the amino acid context at the signal peptidase cleavage site, which impairs signal peptide processing.

The collagens were also tested by proteolytic assays for thermal stability. Resistance to pepsin or trypsin/chymotrypsin was measured by the method of Bruckner et al. (1981, Anal. Biochem. 110:360-368). Basically, samples were incubated with protease at a series of temperatures (4, 20, 25, 30 and 35° C. for pepsin and 20, 25, 30 and 35° C. for trypsin/chymotrypsin). Type I collagen from human skin fibroblasts was incubated with pepsin or trypsin/chymotrypsin as a standard. Results were assayed by western blotting (Towbin et al., 1979, Proc. Natl. Acad. Sci. USA 76:4350-4354) using a rabbit anti-type I collagen antibody from Rockland, Inc. (Gilbertsville, Pa.), detected with a peroxidase-labeled goat anti-rabbit IgG (H+L) and visualized with a chemiluminescent reaction (ECL Western Blotting Kit, Amersham, Inc.). Assay results for α1(I)/α2(I) heterotrimer are shown in FIG. 5. α1(I) homotrimer had equivalent thermal stability as measured by this assay (data not shown).

In this assay, the triple helical portions of the collagen trimer are resistant to protease digestion. As the temperature is increased to the melting point of the triple helical region, the triple helical portions of the molecule become susceptible to proteolytic digestion. Monomeric collagen chains and improperly folded collagen monomers are highly susceptible to protease at low temperatures. These results show that the collagen produced by expression of DNA encoding α1(I) and α2(I) collagen lacking the N and C propeptides is approximately equivalent to human skin fibroblast type I collagen with regard to thermal stability and protease resistance.
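The logic of the assay, in which loss of protease resistance marks the melting of the triple helix, can be sketched as an interpolation over the temperature series. The band fractions below are made-up placeholder numbers chosen only to exercise the code; they are not data from FIG. 5, and the function name `apparent_midpoint` and the 50% threshold are likewise assumptions.

```python
from typing import Optional, Sequence


def apparent_midpoint(temps_c: Sequence[float], fraction_intact: Sequence[float],
                      threshold: float = 0.5) -> Optional[float]:
    """Linearly interpolate the temperature at which the intact fraction falls below threshold.

    Returns None if the series never crosses the threshold.
    """
    if len(temps_c) != len(fraction_intact):
        raise ValueError("temperature and fraction series must have equal length")
    points = list(zip(temps_c, fraction_intact))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold > f1:
            # Straight-line interpolation between the two flanking temperatures
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    return None


if __name__ == "__main__":
    pepsin_temps = [4, 20, 25, 30, 35]                 # temperatures listed in the text
    hypothetical_fractions = [1.0, 1.0, 1.0, 0.75, 0.25]  # placeholder values, NOT measured data
    print(apparent_midpoint(pepsin_temps, hypothetical_fractions))
```

With only four or five temperature points such an estimate is coarse, and western blot band intensities would need to be quantified before this kind of calculation could be applied to real samples.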

The correct folding and register of the three monomers in the yeast-produced triple helical collagen was assayed by digestion with mammalian collagenase. Human skin fibroblast collagenase cleaves each of the three chains of collagen at a single point. Collagenase is highly sensitive to local structure and sequence at the cleavage site. If the molecule is improperly folded or the chains are folded out of register, collagenase will not cleave (Wu et al., 1990, Proc. Natl. Acad. Sci. USA 87:5888-5892). Samples were digested with purified human fibroblast collagenase in 0.05 M tris-HCl, pH 7.5, 0.15 M NaCl, 0.01 M CaCl₂ for 16 hours at 25° C. Prior to use in the assay, procollagenase was activated by treatment with 10 μg/ml trypsin at 25° C. for 30 minutes. The activation reaction was stopped by the addition of soybean trypsin inhibitor to a final concentration of 50 μg/ml. Results were displayed by western blotting using the same system as used for assaying protease resistance and are shown in FIG. 6. The data indicate that collagen produced by expression of DNA encoding α1(I) and α2(I) collagen lacking the N and C propeptides is correctly folded and the monomer chains are assembled in correct register.

The present invention has been detailed both by direct description and by example. Equivalents and modifications of the present invention will be apparent to those skilled in the art, and are encompassed within the scope of the invention. 

1. Purified human collagen or fragment thereof produced by a prokaryotic cell, the purified human collagen or fragment thereof being capable of providing a self aggregate, wherein the purified human collagen or fragment thereof has incorporated therein at least one amino acid which has not undergone post translational enzymatic modification selected from the group consisting of trans-4-hydroxyproline and 3-hydroxyproline, and the purified human collagen or fragment thereof having the at least one amino acid self aggregates to form an extracellular matrix.

2. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 1 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ. ID. NO. 19.

3. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 1 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ. ID. NO. 39.

4. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 1 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ. ID. NO. 43.

5. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 1 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ. ID. NO. 45.

6. Purified human collagen or fragment thereof produced by a prokaryotic cell according to claim 1 wherein the purified collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ. ID. NO. 31.

7. Purified human collagen or fragment thereof produced by a prokaryotic cell, the purified human collagen or fragment thereof being capable of providing a self aggregate, wherein the purified human collagen or fragment thereof has incorporated therein at least one amino acid comprising 3-hydroxyproline which has not undergone post translational enzymatic modification, and the purified human collagen or fragment thereof having the at least one amino acid self aggregates to form an extracellular matrix.

8. Human collagen or a fragment thereof produced by a prokaryotic cell wherein all naturally occurring sites which contain proline or hydroxyproline are occupied by trans-4-hydroxyproline or 3-hydroxyproline, wherein the collagen or fragment thereof have the property of self-aggregation.

9. Human collagen or a fragment thereof according to claim 8, wherein the human collagen is human type I (α₂) collagen.

10. Human collagen or a fragment thereof according to claim 9, wherein the collagen or fragment thereof is encoded by nucleic acid having the sequence shown in SEQ. ID. NO. 31.